Overview
Network components provide proxy management and Tor IP rotation capabilities for anonymous web scraping.
ProxyProvider
Implements ProxyProviderInterface to dynamically select and configure proxy connections.
Class Definition
from typing import Optional, Dict
from domain.interfaces.proxy_interface import ProxyProviderInterface

class ProxyProvider(ProxyProviderInterface):
    def __init__(self):
        self.current_proxy: Optional[Dict[str, str]] = None
Source: infrastructure/network/proxy_provider.py:11-22
Supported Proxy Types
- Authenticated Custom Proxy - Username/password authentication
- Tor Network - SOCKS5 proxy via Tor
- Proxy List - Random selection from configured list
- Direct Connection - No proxy
Methods
get_proxy
Returns proxy configuration based on priority order.
def get_proxy(self) -> Optional[Dict[str, str]]
Returns a proxy configuration dictionary in the format accepted by the requests library's proxies argument, or None for a direct connection.
Source: infrastructure/network/proxy_provider.py:24-58
Priority Order:
- Custom authenticated proxy (if config.USE_CUSTOM_PROXY)
- Tor network (if config.USE_TOR)
- Random proxy from list (if config.PROXY_LIST is non-empty)
- Direct connection (no proxy)
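The priority order above can be sketched as a pure function. This is a minimal illustration, not the project's `get_proxy`: the `ProxySettings` dataclass and its host, port, user, and password fields are hypothetical stand-ins for the config values, and the URL shapes follow the example outputs shown below.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProxySettings:
    # Hypothetical settings object; the flags mirror config.USE_CUSTOM_PROXY,
    # config.USE_TOR and config.PROXY_LIST, the other fields are assumptions.
    use_custom_proxy: bool = False
    custom_host: str = ""
    custom_port: int = 0
    proxy_user: str = ""
    proxy_pass: str = ""
    use_tor: bool = False
    tor_socks: str = "socks5h://127.0.0.1:9050"
    proxy_list: List[str] = field(default_factory=list)


def select_proxy(s: ProxySettings, rng: Optional[random.Random] = None) -> Optional[Dict[str, str]]:
    """Return a requests-style proxies dict, or None for a direct connection."""
    if s.use_custom_proxy:
        # 1. Authenticated custom proxy takes precedence over everything else
        url = f"http://{s.proxy_user}:{s.proxy_pass}@{s.custom_host}:{s.custom_port}"
        return {"http": url, "https": url}
    if s.use_tor:
        # 2. Tor SOCKS5 endpoint; socks5h resolves DNS through the proxy
        return {"http": s.tor_socks, "https": s.tor_socks}
    if s.proxy_list:
        # 3. Random pick from the configured list
        pick = rng.choice if rng is not None else random.choice
        chosen = pick(s.proxy_list)
        return {"http": chosen, "https": chosen}
    return None  # 4. Nothing configured: direct connection
```

Passing a seeded `random.Random` makes the list selection reproducible in tests.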
Example:
from infrastructure.network.proxy_provider import ProxyProvider

proxy_provider = ProxyProvider()
proxy = proxy_provider.get_proxy()
# With Tor enabled:
# {'http': 'socks5h://127.0.0.1:9050', 'https': 'socks5h://127.0.0.1:9050'}
# With custom proxy:
# {'http': 'http://user:pass@proxy.example.com:8080',
# 'https': 'http://user:pass@proxy.example.com:8080'}
# Direct connection:
# None
get_proxy_location
Queries the public IP, city, and country of the current proxy.
def get_proxy_location(self) -> tuple[str, str, str]
Returns a tuple of (public IP, city, country), or ('N/A', 'N/A', 'N/A') on error.
Source: infrastructure/network/proxy_provider.py:60-82
Uses the ipinfo.io service to look up geographic information.
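ipinfo.io's JSON endpoint (https://ipinfo.io/json) returns an object with fields such as `ip`, `city`, and `country`. The sketch below shows only the parsing step with the documented 'N/A' fallback; the function name and the per-field fallback for missing keys are assumptions, not the project's code.

```python
from typing import Any, Mapping, Tuple


def location_from_ipinfo(payload: Any) -> Tuple[str, str, str]:
    """Extract (ip, city, country) from an ipinfo.io JSON response.

    Hypothetical helper: returns ('N/A', 'N/A', 'N/A') when the payload is
    not a mapping, and 'N/A' for any individual field that is missing or empty.
    """
    if not isinstance(payload, Mapping):
        return ("N/A", "N/A", "N/A")
    ip, city, country = (str(payload.get(key) or "N/A") for key in ("ip", "city", "country"))
    return ip, city, country
```

In practice the payload would come from something like `requests.get("https://ipinfo.io/json", proxies=proxy, timeout=10).json()`, wrapped in a try/except that returns the all-'N/A' tuple on network errors.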
Example:
ip, city, country = proxy_provider.get_proxy_location()
print(f"IP: {ip}, Location: {city}, {country}")
# Output: IP: 185.220.101.45, Location: Amsterdam, NL
TorRotator
Implements TorInterface to control the Tor network and rotate exit IPs.
Class Definition
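from stem import Signal
from stem.control import Controller
from domain.interfaces.tor_interface import TorInterface
# `config` is the project's settings module (its import is not shown here)

class TorRotator(TorInterface):
    def __init__(self):
        self.control_port = config.TOR_CONTROL_PORT      # Tor control port
        self.wait_time = config.TOR_WAIT_AFTER_ROTATION  # Seconds to wait after NEWNYM
        self.max_retries = config.MAX_RETRIES            # Max rotation attempts
        self.proxy = config.TOR_PROXY                    # SOCKS5 proxies dict
        self.host = config.TOR_HOST                      # Docker service name of Tor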
Source: infrastructure/network/tor_rotator.py:13-28
Configuration
config.TOR_CONTROL_PORT = 9051          # Tor control port
config.TOR_WAIT_AFTER_ROTATION = 5      # Seconds to wait after rotation
config.MAX_RETRIES = 3                  # Max rotation attempts
config.TOR_PROXY = {                    # SOCKS5 proxy config
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050',
}
config.TOR_HOST = "tor"                 # Docker service name
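One way to keep these values out of code is to read them from environment variables with the defaults above. This is a sketch under assumptions: the environment variable names reuse the config names, `TOR_SOCKS_PORT` and `TorSettings` are hypothetical, and the SOCKS URL is built from the host. Building it from the host is a deliberate choice: if the app runs in its own container, 127.0.0.1 refers to that container, not to the `tor` service.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class TorSettings:
    control_port: int
    wait_after_rotation: float
    max_retries: int
    host: str
    socks_port: int  # hypothetical: not a documented config key

    @property
    def proxy(self) -> Dict[str, str]:
        # Derive the SOCKS5 proxies dict from the same host used for the control port
        url = f"socks5h://{self.host}:{self.socks_port}"
        return {"http": url, "https": url}


def load_tor_settings(env: Optional[Mapping[str, str]] = None) -> TorSettings:
    """Read Tor settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return TorSettings(
        control_port=int(env.get("TOR_CONTROL_PORT", "9051")),
        wait_after_rotation=float(env.get("TOR_WAIT_AFTER_ROTATION", "5")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        host=env.get("TOR_HOST", "tor"),
        socks_port=int(env.get("TOR_SOCKS_PORT", "9050")),
    )
```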
Methods
get_current_ip
Retrieves the current Tor exit IP without rotation.
def get_current_ip(self) -> str
Returns the current Tor public IP address, or an empty string on error.
Source: infrastructure/network/tor_rotator.py:30-40
Example:
tor = TorRotator()
current_ip = tor.get_current_ip()
print(f"Current Tor IP: {current_ip}")
# Output: Current Tor IP: 185.220.101.45
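The "empty string on error" contract can be sketched with an injected fetch function, which also lets the check run without Tor. The `fetch_text` callable is a hypothetical placeholder for an HTTP call through the Tor proxy; validating the body with `ipaddress` is an added assumption that guards against error pages being mistaken for an IP.

```python
import ipaddress
from typing import Callable


def current_ip(fetch_text: Callable[[], str]) -> str:
    """Return the public IP reported by fetch_text(), or "" on any error.

    fetch_text is a hypothetical callable that requests an IP-echo service
    through the Tor SOCKS proxy and returns the response body.
    """
    try:
        candidate = fetch_text().strip()
        ipaddress.ip_address(candidate)  # Raises ValueError if the body is not an IP
        return candidate
    except Exception:
        return ""
```

For example, `current_ip(lambda: requests.get("https://api.ipify.org", proxies=proxy, timeout=10).text)`; which echo service the project actually queries is not stated here.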
rotate_ip
Rotates the Tor circuit and returns the new exit IP.
def rotate_ip(self) -> str
Returns the new public IP after rotation, or the original IP if rotation fails.
Source: infrastructure/network/tor_rotator.py:61-87
Process:
- Gets the current IP
- Sends the NEWNYM signal to the Tor control port
- Waits for the configured time (TOR_WAIT_AFTER_ROTATION)
- Verifies that the new IP differs from the original
- Retries up to max_retries times
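The steps above can be sketched as a loop over injected callables, so the retry logic is testable without a running Tor daemon. This is an illustration, not the project's `rotate_ip`: `get_ip`, `send_newnym`, and the injectable `sleep` are hypothetical parameters.

```python
import time
from typing import Callable


def rotate_ip(
    get_ip: Callable[[], str],
    send_newnym: Callable[[], bool],
    max_retries: int = 3,
    wait_time: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a new Tor circuit; return the new IP, or the original one on failure."""
    original_ip = get_ip()
    for _attempt in range(max_retries):
        if not send_newnym():
            return original_ip  # Control port unreachable: retrying will not help
        sleep(wait_time)  # Give Tor time to build fresh circuits
        new_ip = get_ip()
        if new_ip and new_ip != original_ip:
            return new_ip
    return original_ip  # Every attempt kept the same exit IP
```

A new circuit does not guarantee a new exit IP, and Tor rate-limits NEWNYM, so a very short `wait_time` can make several attempts land on the same address.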
Example:
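tor = TorRotator()
original_ip = tor.get_current_ip()
print(f"Original IP: {original_ip}")
new_ip = tor.rotate_ip()
print(f"New IP: {new_ip}")
# Output (the [TOR] lines are Spanish log messages; English glosses in parentheses):
# Original IP: 185.220.101.45
# [TOR] IP original antes de rotar: 185.220.101.45    (original IP before rotating)
# [TOR] Enviando señal NEWNYM (Intento 1/3)           (sending NEWNYM signal, attempt 1/3)
# [TOR] Rotación exitosa: 185.220.101.45 → 198.98.51.189    (rotation successful)
# New IP: 198.98.51.189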
_send_newnym
Internal method that sends the NEWNYM signal to the Tor control port.
def _send_newnym(self) -> bool
Returns True if the signal was sent successfully, or False on a connection error.
Source: infrastructure/network/tor_rotator.py:42-59
Uses the stem library to communicate with the Tor controller.
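A testable variant of this behavior takes the controller factory and the signal as parameters instead of calling stem directly. The function name and parameters are hypothetical; with stem you would pass `lambda: Controller.from_port(address=tor_ip, port=9051)` and `Signal.NEWNYM`.

```python
import logging
from contextlib import AbstractContextManager
from typing import Any, Callable

logger = logging.getLogger(__name__)


def send_newnym(open_controller: Callable[[], AbstractContextManager], newnym: Any) -> bool:
    """Open a controller, authenticate, and send NEWNYM; True on success, False on error."""
    try:
        with open_controller() as controller:
            controller.authenticate()
            controller.signal(newnym)
        return True
    except Exception as exc:  # Connection refused, authentication failure, socket errors
        logger.error("[TOR] could not use the control port: %s", exc)
        return False
```

Injecting the factory keeps the success/failure contract identical while letting unit tests substitute a fake controller.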
Docker Integration
Tor Service
Requires Tor running in a Docker container:
services:
  tor:
    image: dperson/torproxy
    ports:
      - "9050:9050"  # SOCKS5 proxy
      - "9051:9051"  # Control port
    environment:
      - TOR_ControlPort=9051
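Because the Tor container may start after the app, a startup check that polls the SOCKS or control port can avoid early connection errors. This is a hypothetical helper, not part of the project; note that an open port only means Tor is listening, not that it has finished bootstrapping its circuits.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # Refused, unreachable, or name not yet resolvable
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the Compose network this would be called as, for example, `wait_for_port("tor", 9050)` before the first scraping request.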
Connection from App
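import socket
from stem import Signal
from stem.control import Controller

# Controller.from_port expects an IP address, so resolve the Docker service name first
tor_ip = socket.gethostbyname("tor")
with Controller.from_port(address=tor_ip, port=9051) as controller:
    controller.authenticate()  # Uses whichever auth method the control port offers
    controller.signal(Signal.NEWNYM)  # Ask Tor to switch to clean circuits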
Source: infrastructure/network/tor_rotator.py:47-53
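When the app is run outside Compose (for example, directly on the host during development), the name "tor" will not resolve. A small hypothetical helper can fall back to a local address; the fallback behavior is an assumption, not something the project is documented to do.

```python
import socket
from typing import Callable


def resolve_tor_host(
    name: str = "tor",
    fallback: str = "127.0.0.1",
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve the Tor service name to an IPv4 address for Controller.from_port.

    Returns `fallback` when the name does not resolve (hypothetical behavior).
    """
    try:
        return resolve(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return fallback
```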
Error Handling
Proxy Selection Errors
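# Inside get_proxy(): nothing configured, fall back to a direct connection
if not config.USE_CUSTOM_PROXY and not config.USE_TOR and not config.PROXY_LIST:
    # Log text: "No proxy configured. Using direct connection."
    logger.warning("[PROXY] No se encontró proxy configurado. Usando conexión directa.")
    return None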
Source: infrastructure/network/proxy_provider.py:52-54
Tor Connection Errors
try:
    with Controller.from_port(address=tor_ip, port=self.control_port) as controller:
        controller.authenticate()
        controller.signal(Signal.NEWNYM)
    return True
except Exception as e:
    # Log text: "Could not connect to the TOR control port"
    logger.error(f"[TOR] No se pudo conectar al puerto de control de TOR: {e}")
    return False
Source: infrastructure/network/tor_rotator.py:51-59
IP Rotation Failures
for attempt in range(self.max_retries):
    if not self._send_newnym():
        return original_ip  # Stop retrying on connection failure
    # (The wait of self.wait_time seconds after each NEWNYM is elided in this excerpt.)
    new_ip = self.get_current_ip()
    if new_ip and new_ip != original_ip:
        # Log text: "Rotation successful"
        logger.info(f"[TOR] Rotación exitosa: {original_ip} → {new_ip}")
        return new_ip
# Log text: "Could not rotate the IP after all attempts."
logger.warning("[TOR] No se logró rotar la IP después de todos los intentos.")
return original_ip
Source: infrastructure/network/tor_rotator.py:71-87
Usage Example
from infrastructure.network.proxy_provider import ProxyProvider
from infrastructure.network.tor_rotator import TorRotator
import requests  # SOCKS proxies need the requests[socks] extra (PySocks)

# Initialize components
proxy_provider = ProxyProvider()
tor_rotator = TorRotator()

# Get proxy configuration
proxy = proxy_provider.get_proxy()
print(f"Using proxy: {proxy}")

# Check location
ip, city, country = proxy_provider.get_proxy_location()
print(f"Location: {city}, {country} ({ip})")

# Rotate Tor IP (only has an effect when the selected proxy is the Tor SOCKS5 endpoint)
if proxy:
    old_ip = tor_rotator.get_current_ip()
    new_ip = tor_rotator.rotate_ip()
    print(f"IP changed: {old_ip} → {new_ip}")

# Make request with proxy
response = requests.get(
    "https://www.imdb.com/chart/top/",
    proxies=proxy,
    timeout=10,
)
print(f"Response status: {response.status_code}")
Security Considerations
Proxy Credentials: Never commit proxy usernames or passwords to version control. Load them from environment variables instead:
import os
config.PROXY_USER = os.getenv("PROXY_USER")
config.PROXY_PASS = os.getenv("PROXY_PASS")
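When credentials come from the environment, they may contain characters such as '@', ':' or '/' that break a proxy URL. A minimal sketch, assuming a hypothetical `proxy_from_env` helper and an example host and port, percent-encodes them and fails fast if either variable is missing; requests unquotes credentials embedded in proxy URLs.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import quote


def proxy_from_env(host: str, port: int, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a requests-style proxies dict from PROXY_USER / PROXY_PASS (hypothetical helper)."""
    env = os.environ if env is None else env
    user, password = env.get("PROXY_USER"), env.get("PROXY_PASS")
    if not user or not password:
        raise RuntimeError("PROXY_USER and PROXY_PASS must be set")
    # Percent-encode so reserved characters cannot be misread as URL delimiters
    url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": url, "https": url}
```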
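Tor Anonymity: Tor hides your IP address, but scraping behavior can still be detected, for example through request rate and timing, and Tor exit node addresses are publicly listed. Use appropriate delays and vary request patterns.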
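A simple way to avoid a fixed, machine-like request cadence is a randomized delay between requests. The sketch below is illustrative only; the base and jitter values are assumptions, not settings taken from the project.

```python
import random
from typing import Optional


def polite_delay(base: float = 2.0, jitter: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay of roughly base to base + jitter seconds."""
    r = rng if rng is not None else random.Random()
    return base + r.uniform(0.0, jitter)
```

Call `time.sleep(polite_delay())` between requests; passing a seeded `random.Random` makes the sequence reproducible in tests.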